feat(errors): structured what/why/fix errors + stable exit codes (error-system v2) by alex-mextner · Pull Request #27 · alex-mextner/rig-cli

alex-mextner · 2026-06-16T21:35:18Z

error-system v2 — structured what/why/fix errors + stable exit codes

Born from two same-day prod failures whose errors were thin and undiagnosable:
unknown mcp item(s): review (known: none) (no hint it was a removed slot) and a
dead hook path surfacing only as a generic harness "PreToolUse error". This makes every
rig error a consistent WHAT / WHY / HOW-to-fix block with a stable per-class exit
code (a public contract, documented in rig --help).

What's in

riglib/errors.py — RigError(what/why/fix/exit_code) + per-class codes
(2 config, 3 drift, 4 unknown-item, 5 missing-target, 6 not-a-repo,
127 missing-dep), a single top-level guard() renderer, did-you-mean
(Levenshtein-nearest), and a removed-slot registry (mcp.items.review → names
agent-tools feat(status): area-coverage summary — every reconciled area, not "mostly skills" #32 + the exact fix instead of "known: none").
riglib/layers.py — GLOBAL vs REPO classification, single source of truth so
rig status groups drift by layer and names which config file declares each item.
riglib/missing_target.py — proactively scans the harness settings.json for
hook commands pointing at a script gone from disk (the dead-rtk-hook case). Resolves
the invoked script (skips bare/absolute interpreters like /usr/bin/env,
/opt/homebrew/bin/python3; honors -c only as an interpreter flag), checks only that
token so a runtime output arg never false-flags.
cli.py — 3-part error rendering + layer-grouped rig status; a non-git dir
shows not a git repository — repo layer N/A (never the "should be committed" nag) and
degrades gracefully when the agent-tools catalog can't be resolved. rig doctor honors
the documented 127 missing-dependency exit code.
plan.py raises the structured UnknownItemError; config.py adds
primary_config_path for error provenance.

Verification

uv run --with pytest python -m pytest tests/ → 504 passed.
bash tests/smoke.sh → fully green (incl. the non-git, clean-sample, removed-slot,
and dead-hook legs). Fixed a latent smoke bug: the non-git test dir was nested under the
smoke's own git init-ed $TMP, so it wasn't actually non-git.
Reviewed across models with review --staged (Opus / Codex); their findings on the new
modules were addressed (HOME-isolated the new status tests; tightened the missing-target
script resolution for absolute interpreters and -c).

Known follow-ups (out of scope for this PR; CTO to triage)

Per-key config provenance: primary_config_path names the repo file even when the
bad key came from the global layer.
Exit-code class precedence: when rig status has both drift and a dead target, it
returns 3 (drift) and the 5 (missing-target) signal is masked.
The new structured errors.ConfigError exists but config.ConfigError is still what's
raised for malformed config (so YAML errors still render the old one-line form).
The non-git CatalogError fallback skips the missing-target scan (a dead ~/.claude
hook is surfaced in normal status but not in the non-git + no-catalog path).

🤖 Generated with Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4de60aaa3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Three deferred P2 findings from the PR #27 codex review, now fixed properly: 1. config.py — per-key LAYER PROVENANCE for unknown items. The loader now tracks which config FILE last declared each top-level key (key_sources, built in cascade order). An unknown item that came SOLELY from the global config is reported against ~/.config/rig/config.yaml, not the repo rig.yaml. plan.py's unknown-item errors resolve the source via config.source_for_key(). 2. missing_target.py — SKIP `-m module` invocations. `python3 -m my_hook --out x` runs a module by import name, not a script FILE; the scanner now treats `-m` like `-c` (interpreter mode, no path to verify), so a healthy module-based hook with an absolute --out arg no longer false-flags as a missing target. 3. cli.py — EXIT-CODE PRECEDENCE. A dead hook/target now surfaces its exit code (EXIT_MISSING_TARGET=5) even alongside config<->disk drift, instead of being masked by the drift exit (3). Both findings are still printed; missing-target outranks drift (it fails at runtime). Precedence documented in `rig --help`. Also makes tests/smoke.sh run pytest via `uv run --with pytest` when uv is available (the documented command), so the suite runs on a machine whose bare python3 lacks pytest. Tests: RED-first for each finding; full suite 504 -> 513 passing; smoke green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…or-system v2) Born from two same-day prod failures whose errors were thin and undiagnosable: `unknown mcp item(s): review (known: none)` (no hint it was a REMOVED slot) and a dead hook path surfacing only as a generic harness "PreToolUse error". This makes every rig error a consistent WHAT / WHY / HOW-to-fix block with a stable per-class exit code (a PUBLIC CONTRACT documented in `rig --help`). What's in: - riglib/errors.py: RigError(what/why/fix/exit_code) + per-class codes (2 config, 3 drift, 4 unknown-item, 5 missing-target, 6 not-a-repo, 127 missing-dep), a single top-level `guard()` renderer, did-you-mean (Levenshtein), and a removed-slot registry (mcp.items.review -> names agent-tools #32 + the fix). - riglib/layers.py: GLOBAL vs REPO classification — single source of truth so `rig status` groups drift by layer and names WHICH config file declares each item. - riglib/missing_target.py: proactively scans the harness settings.json for hook commands pointing at a script gone from disk (the dead-rtk-hook case). Resolves the invoked SCRIPT (skips bare/absolute interpreters like /usr/bin/env, honors `-c` only as an interpreter flag) and checks only that token, so a runtime output arg never false-flags. - cli.py: `rig status` renders the 3-part errors + layer grouping; a non-git dir shows "not a git repository — repo layer N/A" (never the "should be committed" nag), and degrades gracefully when the agent-tools catalog can't be resolved. plan.py raises the structured UnknownItemError. config.py adds primary_config_path for provenance. `rig doctor` honors the documented 127 missing-dependency exit code. Tests: 504 pytest passing; tests/smoke.sh fully green (incl. the non-git + dead-hook + removed-slot legs). New: test_errors, test_missing_target, test_plan_errors, test_status_layers; smoke gains non-git / clean-sample / removed-slot assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Three deferred P2 findings from the PR #27 codex review, now fixed properly: 1. config.py — per-key LAYER PROVENANCE for unknown items. The loader now tracks which config FILE last declared each top-level key (key_sources, built in cascade order). An unknown item that came SOLELY from the global config is reported against ~/.config/rig/config.yaml, not the repo rig.yaml. plan.py's unknown-item errors resolve the source via config.source_for_key(). 2. missing_target.py — SKIP `-m module` invocations. `python3 -m my_hook --out x` runs a module by import name, not a script FILE; the scanner now treats `-m` like `-c` (interpreter mode, no path to verify), so a healthy module-based hook with an absolute --out arg no longer false-flags as a missing target. 3. cli.py — EXIT-CODE PRECEDENCE. A dead hook/target now surfaces its exit code (EXIT_MISSING_TARGET=5) even alongside config<->disk drift, instead of being masked by the drift exit (3). Both findings are still printed; missing-target outranks drift (it fails at runtime). Precedence documented in `rig --help`. Also makes tests/smoke.sh run pytest via `uv run --with pytest` when uv is available (the documented command), so the suite runs on a machine whose bare python3 lacks pytest. Tests: RED-first for each finding; full suite 504 -> 513 passing; smoke green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ixture #27's clean-sample leg enumerates every default-ON category and disables it to assert zero false drift. tg_ctl (added by this PR) is default-ON, so its provision_tg_ctl action made the clean sample report drift (exit 3). Mirror how #29 added 'gitignore: {enabled: false}' and disable tg_ctl too.

* feat(rig): add tg_ctl config block + pure plist planning Add the tg_ctl config block (validate + plan) and the pure, effect-free TgCtlPlan that renders the ai.hyperide.tg-ctl.plist LaunchAgent XML byte-exact to the working hand-created file (sort_keys=False preserves the insertion order so a re-apply is a true no-op). Default-on, per-machine (GLOBAL layer), macOS-only. Mirrors the tmux block's schema style. boot:null and label:null resolve to their defaults (not bool(None)=False / str(None)="None"). Reviewed via multi-model `review`; findings addressed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(rig): provision + reconcile tg-ctl boot LaunchAgent Runner: _do_provision_tg_ctl writes the byte-exact plist, backs up a differing prior, ensures the log dir, tears down the stale predecessor (com.ultra.codex-tg-bot: bootout + timestamped backup + remove), and (re)loads via launchctl bootout/bootstrap in the gui/<uid> domain. A re-apply against the already-correct loaded plist is a skipped no-op. RIG_TG_CTL_DRY_RUN writes the plist but skips every live/destructive mutation (launchctl AND the stale teardown) so tests/smoke never touch the real launchd domain. Drift: _check_tg_ctl flags missing / divergent / written-but-not-loaded, a leftover plist when boot:false, and the stale predecessor (extra). CLI: GLOBAL status line shows installed / drifted / disabled / unsupported (off-darwin), resolved through the shared plan builder. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(rig): tg-ctl unit suite + HOME-isolated smoke leg test_tg_ctl.py mirrors test_tmux.py: config validation, byte-exact plist render (incl. against the live machine plist when present, read-only), create/idempotent/conflict/dry-run states, stale-predecessor teardown, drift (missing/modified/extra/not-loaded), status states, and the boot:null / label:null / dry-run-no-stale-removal / off-darwin regressions. conftest neutralizes the default-on tg_ctl provisioner + drift check and stubs the gui-domain launchctl seams suite-wide (dedicated tests restore the real ones with their own HOME-isolated tmp dirs); no test ever touches the real ~/Library/LaunchAgents or runs real launchctl. smoke.sh gains a focused, HOME-isolated, RIG_TG_CTL_DRY_RUN tg-ctl leg and prefers `uv run --extra test pytest`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(rig): document the tg_ctl config block docs/config-schema.md: the tg_ctl section (keys, defaults, the byte-exact no-op contract, gui-domain (re)load, stale-predecessor teardown, drift, the RIG_TG_CTL_DRY_RUN seam, and the enabled:false vs boot:false distinction) + the validation paragraph. AGENTS.md: refine the "never mutate a LIVE service" rule — the stateless background daemons (models cron, tg_ctl) are the documented (re)load exceptions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(rig): disable tg_ctl in the error-system-v2 clean-sample smoke fixture #27's clean-sample leg enumerates every default-ON category and disables it to assert zero false drift. tg_ctl (added by this PR) is default-ON, so its provision_tg_ctl action made the clean sample report drift (exit 3). Mirror how #29 added 'gitignore: {enabled: false}' and disable tg_ctl too. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread riglib/config.py

Comment thread riglib/missing_target.py Outdated

Comment thread riglib/cli.py Outdated

alex-mextner mentioned this pull request Jun 16, 2026

feat(rig): provision .claude/worktrees ignore via global core.excludesfile (supersedes #23) #29

Merged

alex-mextner and others added 2 commits June 17, 2026 06:53

alex-mextner force-pushed the error-system-v2 branch from 3e07a9c to fe2a6a5 Compare June 17, 2026 04:54

alex-mextner merged commit 4aff90f into main Jun 17, 2026
12 checks passed

alex-mextner deleted the error-system-v2 branch June 17, 2026 04:55

alex-mextner mentioned this pull request Jun 17, 2026

feat(status): area-coverage summary — every reconciled area, not "mostly skills" #32

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(errors): structured what/why/fix errors + stable exit codes (error-system v2)#27

feat(errors): structured what/why/fix errors + stable exit codes (error-system v2)#27
alex-mextner merged 2 commits into
mainfrom
error-system-v2

alex-mextner commented Jun 16, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

alex-mextner commented Jun 16, 2026

error-system v2 — structured what/why/fix errors + stable exit codes

What's in

Verification

Known follow-ups (out of scope for this PR; CTO to triage)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant